448 research outputs found

    ODN: Opening the Deep Network for Open-set Action Recognition

    Full text link
    In recent years, the performance of action recognition has been significantly improved with the help of deep neural networks. Most of the existing action recognition works hold the \textit{closed-set} assumption that all action categories are known beforehand while deep networks can be well trained for these categories. However, action recognition in the real world is essentially an \textit{open-set} problem, namely, it is impossible to know all action categories beforehand and consequently infeasible to prepare sufficient training samples for those emerging categories. In this case, applying closed-set recognition methods will definitely lead to unseen-category errors. To address this challenge, we propose the Open Deep Network (ODN) for the open-set action recognition task. Technologically, ODN detects new categories by applying a multi-class triplet thresholding method, and then dynamically reconstructs the classification layer and "opens" the deep network by adding predictors for new categories continually. In order to transfer the learned knowledge to the new category, two novel methods, Emphasis Initialization and Allometry Training, are adopted to initialize and incrementally train the new predictor so that only few samples are needed to fine-tune the model. Extensive experiments show that ODN can effectively detect and recognize new categories with little human intervention, thus applicable to the open-set action recognition tasks in the real world. Moreover, ODN can even achieve comparable performance to some closed-set methods.Comment: 6 pages, 3 figures, ICME 201

    A Convolutional Long Short-Term Memory Neural Network Based Prediction Model

    Get PDF
    In recent years, the market demand for online car-hailing service has expanded dramatically. To satisfy the daily travel needs, it is important to predict the supply and demand of online car-hailing in an accurate manner, and make active scheduling based on the predicted gap between supply and demand. This paper puts forward a novel supply and demand prediction model for online carhailing, which combines the merits of convolutional neural network (CNN) and long short-term memory (LSTM). The proposed model was named convolutional LSTM (C-LSTM). Next, the original data on online car-hailing were processed, and the key features that affect the supply and demand prediction were extracted. After that, the C-LSTM was optimized by the AdaBound algorithm during the training process. Finally, the superiority of the C-LSTM in predicting online car-hailing supply and demand was proved through contrastive experiments

    SODFormer: Streaming Object Detection with Transformer Using Events and Frames

    Full text link
    DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges (e.g., fast motion blur and low-light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual streams to improve the detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate two heterogeneous sensing modalities and take complementary advantages from each end, which can be queried at any time to locate objects and break through the limited output frequency from synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code can be available at https://github.com/dianzl/SODFormer.Comment: 18 pages, 15 figures, in IEEE Transactions on Pattern Analysis and Machine Intelligenc

    Parsing Objects at a Finer Granularity: A Survey

    Full text link
    Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. In this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we conclude several promising lines of research in fine-grained visual parsing for future research.Comment: Survey for fine-grained part segmentation and object recognition; Accepted by Machine Intelligence Research (MIR
    • …
    corecore